Runtime Optimization of Join Location in Parallel Data Management Systems
Abstract
Applications running on parallel systems often need to join a streaming relation or a stored relation with data indexed in a parallel data storage system. Some applications also compute UDFs on the joined tuples. The join can be done at the data storage nodes, corresponding to reduce-side joins, or by fetching data from the storage system to the compute nodes, corresponding to map-side joins. Both may be suboptimal: reduce-side joins may cause skew, while map-side joins may cause large amounts of data to be transferred and replicated. In this paper, we present techniques that choose between the two options at runtime on a per-key basis to improve join throughput, accounting for UDF computation, if any. Our techniques are based on an extended ski-rental algorithm and provide worst-case performance guarantees with respect to the optimal point in the space we consider. They balance load by taking into account CPU, network, and I/O costs as well as the load on compute and storage nodes. We have implemented our techniques on Hadoop, Spark, and the Muppet stream processing engine. Our experiments show that our optimization techniques provide a significant improvement in throughput over existing techniques.
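To make the per-key decision concrete, the following is a minimal sketch of the classic break-even ski-rental rule, not the extended algorithm of the paper, whose details the abstract does not give. It assumes a hypothetical two-parameter cost model: `rent_cost` is the cost of serving one join request at the storage node, and `buy_cost` is the one-time cost of fetching a key's data to the compute node. The class and method names are illustrative only.

```python
from collections import defaultdict


class SkiRentalJoinChooser:
    """Classic break-even ski-rental rule, applied independently per join key.

    Hypothetical cost model (an assumption, not taken from the paper):
      rent_cost: cost of one join request served at the storage node
      buy_cost:  one-time cost of fetching the key's data to the compute node
    """

    def __init__(self, rent_cost: float, buy_cost: float):
        if rent_cost <= 0 or buy_cost <= 0:
            raise ValueError("costs must be positive")
        self.rent_cost = rent_cost
        self.buy_cost = buy_cost
        self.rent_paid = defaultdict(float)  # cumulative remote cost per key
        self.cached = set()                  # keys already fetched locally

    def decide(self, key) -> str:
        """Return 'remote', 'fetch', or 'local' for the next tuple with this key."""
        if key in self.cached:
            return "local"
        # Buy once paying rent again would push total rent past the buy cost.
        if self.rent_paid[key] + self.rent_cost > self.buy_cost:
            self.cached.add(key)
            return "fetch"
        self.rent_paid[key] += self.rent_cost
        return "remote"


if __name__ == "__main__":
    chooser = SkiRentalJoinChooser(rent_cost=1.0, buy_cost=3.0)
    print([chooser.decide("user42") for _ in range(5)])
```

Under this rule the rent paid for a key never exceeds `buy_cost`, so the total cost for any key is at most twice that of an offline optimum that knows the request count in advance. The abstract indicates that the paper's technique additionally accounts for CPU, network, and I/O costs and for node load, which this sketch ignores.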
Similar resources
A Multi Objective Optimization Model for Redundancy Allocation Problems in Series-Parallel Systems with Repairable Components
The main goal of this paper is to propose an optimization model for determining the structure of a series-parallel system. Relative to previous studies of series-parallel systems, the main contribution of this study is to extend the redundancy allocation problem to systems that have repairable components. The considered optimization model has two objectives: maximizing the system mean time t...
A Multi-objective Model for Location of Transfer Stations: Case Study in Waste Management System of Tehran
This paper presents a multi-objective optimization model for the design of a waste management system consisting of customers, transfer stations, landfills and collection vehicles. The developed model aims to simultaneously minimize the total costs, greenhouse gas emissions and the rates of energy consumption. To tackle the multiple objectives in the problem, we utilize an interactive fuzzy prog...
Developing a bi-objective optimization model for solving the availability allocation problem in repairable series–parallel systems by NSGA II
This paper addresses bi-objective optimization of the availability allocation problem in a series–parallel system with repairable components. The two objectives are the availability of the system and its total cost. Relative to previous studies of series–parallel systems, the main contribution of this study is to extend redundancy allocation problems to syst...
Parallelizing query optimization
Many commercial RDBMSs employ cost-based query optimization exploiting dynamic programming (DP) to efficiently generate the optimal query execution plan. However, optimization time increases rapidly for queries joining more than 10 tables. Randomized or heuristic search algorithms reduce query optimization time for large join queries by considering fewer plans, sacrificing plan optimality. Thou...
A Robust Optimization Methodology for Multi-objective Location-transportation Problem in Disaster Response Phase under Uncertainty
This paper presents a multi-objective model for location-transportation problem under uncertainty that has been developed to respond to crisis. In the proposed model, humanitarian aid distribution centers (HADC), the number and location of them, the amount of relief goods stored in distribution centers, the amount of relief goods sent to the disaster zone, the number of injured people transferr...
Journal title:
- PVLDB
Volume 10, Issue
Pages -
Publication date 2017